SECTION I

The derivation of the Law of Probability may be found in any text 
on the subject. Here we shall assume its validity and use it to obtain 
the several quantities which serve as criteria in statistical calculations. 

In the fundamental equation 

$y = k\,e^{-h^{2}x^{2}}$   [1]

there are two characteristic constants, k and h, whose numerical values 
must be known for a given set of data before we can proceed with any 
calculations. A simple and at the same time exact method of obtaining 
the numerical values for those constants forms the subject of this paper. 
Since y equals k when x equals zero, k is the probability of an error
of zero and will therefore be defined here as the largest number of
measurements of a given set having the same numerical value; while y will
denote any number of measurements whose group value ranges from
zero to the group value of the number of measurements denoted by
k or $y_0$. Equation (1) then becomes

$\frac{y}{y_0} = e^{-h^{2}x^{2}}$   [2]

which by means of logarithms we have transformed into a linear equa- 
tion, 

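$\mathrm{Log}\left(2.303\,\mathrm{Log}\frac{y_0}{y}\right) = 2\,\mathrm{Log}\,x + 2\,\mathrm{Log}\,h$   [3]

or

$\mathrm{Log}\left(\mathrm{Log}\frac{y_0}{y}\right) = 2\,\mathrm{Log}\,x + 2\,\mathrm{Log}\,h - 0.3623$   [4]

Collecting 2 Log h and -0.3623 into one constant, we have,

$\mathrm{Log}\left(\mathrm{Log}\frac{y_0}{y}\right) = 2\,\mathrm{Log}\,x + K$   [5]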

^1 From the Division of Soil Chemistry and Bacteriology, College of Agriculture,
University of California, Berkeley.
If we now plot $\mathrm{Log}(\mathrm{Log}\,y_0 - \mathrm{Log}\,y)$ as ordinate against Log x as
abscissa, all the measurements should theoretically fall on a straight
line with a slope of 2, provided the data are susceptible to statistical
interpretation, that is, provided they are truly chance data. Practically,
however, even such data fall on either side of the straight line. Drawing
now the "best" straight line with a slope of 2 through these points, we 
can then read off the values on the line as accurately as we choose, 
depending upon the size of the scale of plotting, and construct a "theo- 
retical" frequency curve for comparison with the experimental frequency 
curve obtained in the usual way; that is, by plotting the number of 
experiments in groups or classes against the measured values. Frequently
$y_0$ does not fall directly over the arithmetical mean. In such
a case the theoretical polygon may be shifted to the left or to the right, 
and this corresponds to the parallel shifting of the straight line from 
which the values for the construction of the theoretical frequency poly- 
gon have been obtained. Often this theoretical polygon reveals the 
fact that the arithmetical mean calculated from the raw data is not
in all cases the "best" mean, for, as frequently happens, one or two
abnormal values will vitiate the mean considerably, especially if the
number of experiments is not sufficiently large. We must, therefore,
so superpose the two polygons as to make their areas approximately 
equivalent, since, as will be shown later, the areas play an important 
part in the calculation of the probable error. A concrete example will 
best illustrate the method of procedure. 
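
The graphical fitting step described above can also be imitated numerically. The following is only a minimal sketch in Python, assuming the class frequencies, the maximum frequency $y_0$ and the residuals x are already in hand; the function name `fit_K`, the optional weights and the synthetic numbers are illustrative assumptions and do not come from this paper. With the slope fixed at 2, the least-squares intercept K reduces to the (weighted) mean of Log(Log y0/y) - 2 Log x.

```python
import math

def fit_K(xs, ys, y0, weights=None):
    """Intercept K of the line Log(Log(y0/y)) = 2 Log x + K, slope fixed at 2.

    xs: residuals (absolute distances from the mean), all > 0
    ys: class frequencies, all with 0 < y < y0
    weights: optional positive weights (hypothetical; e.g. larger for
             points taken from the upper part of the polygon)
    """
    if weights is None:
        weights = [1.0] * len(xs)
    total = 0.0
    for x, y, w in zip(xs, ys, weights):
        ordinate = math.log10(math.log10(y0 / y))
        total += w * (ordinate - 2.0 * math.log10(x))
    return total / sum(weights)

# Synthetic check: points generated from y = y0 * exp(-h^2 x^2) with h = 0.7
h_true, y0 = 0.7, 40.0
xs = [0.5, 1.0, 1.5, 2.0]
ys = [y0 * math.exp(-(h_true * x) ** 2) for x in xs]

K = fit_K(xs, ys, y0)
h = 10 ** ((K + 0.3623) / 2)   # index of precision recovered from K
print(round(K, 4), round(h, 4))
```

Because the synthetic points lie exactly on the theoretical line, the recovered h agrees with the value used to generate them to within the rounding of the constant 0.3623; real data would scatter about the line, and the weights would then matter.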

In a recent paper by Waynick and Sharp (1919) are given the nitrogen 
contents of a hundred samples of a local soil. The results are recorded 
to 0.001%, based upon ten gram samples, and therefore to 0.1 mg. In
figure I these one hundred results are mapped in groups or classes 0.1 mg.
apart, the circles indicating the number of determinations falling into
each class.^2 Plotting these classes vertically to a scale of one-half inch
per one determination, we obtain the multimodal curve drawn immediately
above the circles. Evidently the analyses are too fine as compared
with the variability of nitrogen in those samples of soil. The
determinations were then grouped in classes 0.5 mg. apart,
resulting in the next curve above. This curve bears some resemblance 
to a "frequency" curve, but is still unsatisfactory. However, such 
a curve is quite sufficient for the construction of a theoretical frequency 
polygon by our straight line method. In this case $y_0$ would equal 25
and could be made to fall directly over the arithmetical mean, 10.0 mg.

2 When the circles fall on a line dividing two classes, then if the number of circles 
is even they are equally divided between the two classes; if odd, the extra one is put 
into that class which helps to make the experimental polygon most symmetrical.
Indeed, if we attempt to plot these data in classes very much farther
apart than 0.5 mg., say 2.0 mg., we obtain a so-called skew curve, and,
finally, we may obtain a line sloping in one direction only when we plot
these data in classes 2.5 mg. apart. It is evident, therefore, that such
skew curves are meaningless.^3 In the present case when we plot the
data in classes 1.0 mg. apart the curve "skews" but slightly. Here
$y_0$ falls directly over the arithmetical mean, and the one hundred
determinations fall into four classes. With these four points on the curve,
two on each side of the mean and approximately equidistant from it 
we may construct the straight line as shown in figures VII, VIII and 
IX, where the values for $\mathrm{Log}(\mathrm{Log}\,y_0 - \mathrm{Log}\,y)$ are plotted as ordinates
and the values for Log x as abscissae, x denoting the residuals on either 
side of the mean without regard to algebraic sign. It should be noted 
that in drawing the "best" straight line with the theoretical slope of 2 
through such points proportionately less weight must be given to points 
taken from the experimental polygon near the base than to those taken 
from the upper portion of the curve. A little practice will soon enable 
one to judge at a glance which points are most significant. Having now 
obtained the "best" straight line, we may calculate any number of 
values for x by means of equation (5), namely: 

$\mathrm{Log}\left(\mathrm{Log}\frac{y_0}{y}\right) = 2\,\mathrm{Log}\,x + K,$

K denoting the distance on the $\mathrm{Log}(\mathrm{Log}\,y_0 - \mathrm{Log}\,y)$ axis, or ordinate,
from the origin to the point of its intersection by the "best" straight
line. In the present example $y_0$ equals 40 and y may be taken anywhere
from one to thirty-nine, but for the construction of the theoretical 
polygon six to ten values for y will suffice. These are shown in table I. 
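
The calculation of x from chosen values of y can be sketched as follows. This is an illustration only: $y_0$ = 40 is taken from the present example, but the value of K and the particular y values are hypothetical placeholders, not readings from the figures, and the function name `residual_for` is an assumption. Solving equation (5) for x gives x = sqrt(Log(y0/y) / 10^K).

```python
import math

def residual_for(y, y0, K):
    """Solve Log(Log(y0/y)) = 2 Log x + K for x (common logarithms)."""
    return math.sqrt(math.log10(y0 / y) / 10 ** K)

y0 = 40        # maximum class frequency, as in the present example
K = -0.67      # hypothetical intercept read from the straight-line plot

# Each y gives one point on either side of the mean, at -x and +x.
for y in (1, 5, 10, 20, 30, 39):
    x = residual_for(y, y0, K)
    print(f"y = {y:2d}   x = +/-{x:.3f}")
```

Plotting each y at both -x and +x about the mean yields the symmetrical theoretical polygon.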

Discussion of the Figures 

Figure I has been fully discussed. Figure II is but another example 
of how to construct a theoretical polygon approximately equivalent in 
area to the experimental polygon. An interesting set of data is that 
mapped in figure III. Here the total nitrogen in each sample is so 
small that a few samples might have contained no measurable amount of 
nitrogen at all. The values for the construction of these two figures, 
II and III, were taken from a paper by Waynick (1918). 

The data mapped in figure IV are recorded in a paper by Batchelor 
and Reed (1918). Here as in figure III the theoretical polygon indicates
that among the one thousand orange trees about three might have borne 

3 A discussion of truly abnormal curves and their susceptibility to statistical 
interpretation will be given in another paper. See also section II of this paper. 
no fruit at all had they been left wholly to chance. In fact one tree
yielded but five pounds of fruit, which is practically zero, while another 
yielded 341 pounds, the mean of all the thousand trees being 137.6 
pounds of fruit. Two more interesting sets of data are those of Wood 
(1910) on the dry weights of mangel roots, and of Collins (1912) on
butter fat. These results are mapped in figures V and VI. In figures 
VII, VIII and IX are shown the construction of the straight lines from 
the experimental data as previously described. Finally, in figure X 
are mapped the results of bacterial counts taken from a recent article 
in Science (1920). 

Calculation of the Index of Precision 

Turning once more to the straight line plots on figures VII, VIII 
and IX, we see that we may read off the values for K of equation (5) 
to any degree of accuracy, depending upon the size of the scale of the 
plot. On the above plots, 20x20 inches, the values for K can be read 
off accurately to three places of decimals, which is quite sufficient for 
most cases. With this value for K of a given set of measurements we
can calculate the value for h, the Index of Precision, as is shown in 
equation (5) where K was put in place of 2 Log h - 0.3623; hence,

$h = (10)^{\frac{K + 0.3623}{2}}$   [6]

Calculation of the Probable Error 

The simplest way of calculating the probable error is to take from a
probability integral table the value for hx corresponding to the integral
value 1/2. This value for hx is 0.4769; hence,

$x = 0.4769\,(10)^{-\frac{K + 0.3623}{2}}$   [7]

We might of course draw a straight line through every "class" point 
parallel to the "best" straight line and so obtain a probable error for 
each class which, when meaned, would give an average probable error. 
However, in most cases the probable error obtained from the "best"
straight line is more accurate. 
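
Equations (6) and (7) may be evaluated in a few lines. The sketch below assumes a hypothetical value of K, such as might be read from a plot; it is not a value taken from the figures of this paper, and the function names are illustrative.

```python
import math

def index_of_precision(K):
    """h from equation (6): h = 10 ** ((K + 0.3623) / 2)."""
    return 10 ** ((K + 0.3623) / 2)

def probable_error(K):
    """Probable error from equation (7): x = 0.4769 / h."""
    return 0.4769 / index_of_precision(K)

K = -0.670   # hypothetical intercept, read to three decimals from the plot
h = index_of_precision(K)
pe = probable_error(K)
print(f"h = {h:.4f}, probable error = {pe:.4f}")

# Round trip: K should be recovered as 2 Log h - 0.3623
print(round(2 * math.log10(h) - 0.3623, 3))
```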

A more instructive method of calculating the probable error is to
make a tracing of the theoretical polygon, which is constructed from the
values read off on the straight line plot, on reasonably uniform tracing
cloth, then carefully to cut out the area under this curve, roll
it up and finally weigh it on accurate balances. The polygon is then
unrolled, folded along the mode exactly in two, and trimmed along
the sides parallel to the fold by means of a photographer's print trimmer
until it weighs exactly one-half of the original weight. Replacing now
this trimmed tracing upon the original theoretical polygon, we may read 
off the probable error on the base of the polygon at the limit of the 
tracing. 
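
A numerical counterpart of this weighing procedure may be sketched as follows. It is an illustration under stated assumptions, not the author's method: the smooth curve $y = y_0 e^{-h^2 x^2}$ stands in for the traced polygon, the value of h is hypothetical, and bisection plays the part of the print trimmer. The fraction of the area lying between -x and +x is erf(hx), so the trimming stops where that fraction reaches one-half.

```python
import math

def fraction_within(x, h):
    """Fraction of the area under y = y0*exp(-h^2 x^2) between -x and +x."""
    return math.erf(h * x)

def probable_error_by_area(h, tol=1e-10):
    """Find x such that exactly half the area lies within +/-x (bisection)."""
    lo, hi = 0.0, 10.0 / h      # erf(10) equals 1 to machine precision
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fraction_within(mid, h) < 0.5:
            lo = mid            # too little area kept: widen the strip
        else:
            hi = mid            # too much area kept: trim further
    return (lo + hi) / 2

h = 0.7                         # hypothetical index of precision
pe = probable_error_by_area(h)
print(f"probable error = {pe:.4f}, h * x = {h * pe:.4f}")
```

The product h times x found in this way agrees with the tabular value 0.4769 used in equation (7), which is the point of the comparison.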

Calculation of the Probable Error of the Arithmetical Mean

By means of the Principle of Least Squares it can be shown that the
probable error of the arithmetical mean, $x_0$, is equal to the probable
error (obtained from h) of one determination divided by the square
root of the number of determinations, n, or,

$x_0 = \frac{x}{\sqrt{n}} = \frac{0.4769}{\sqrt{n}}\,(10)^{-\frac{K + 0.3623}{2}}$   [8]
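
As a brief arithmetical check, the division by the square root of the number of determinations may be applied to two of the probable errors reported by the new method in the summary table: 0.68 for the one hundred soil samples of table I, and 35.26 for the one thousand orange trees of table IV. The function name is an illustrative assumption.

```python
import math

def probable_error_of_mean(pe_single, n):
    """Probable error of the mean: single-determination value over sqrt(n)."""
    return pe_single / math.sqrt(n)

# Table I: 100 soil samples; Table IV: 1000 orange trees
print(round(probable_error_of_mean(0.68, 100), 3))
print(round(probable_error_of_mean(35.26, 1000), 2))
```

The two results, 0.068 and 1.12, agree with the probable errors of the mean entered for tables I and IV under the new method.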

TABLES OF RESULTS 

In the tables below are given in the first columns the number of 
determinations falling into each class, while in the last columns are 
given the values calculated by means of the straight lines for the con- 
struction of the theoretical polygons. The headings are self-explana- 
tory. The Roman numerals of each table correspond to the Roman 
numerals on the figures constructed from these tables. 
Table Xa

                          Calculated from K = 5.625
  y      Log(y0/y)    [Log(m/m0)]^2    Log(m/m0)    Log m
  0        +∞                            ±∞          ±∞
  0.2     1.477          0.6029         0.7765      +1.9765, +0.4235
  0.5     1.079          0.4404         0.6636      +1.8636, +0.5364
  1       0.778          0.3176         0.5636      +1.7636, +0.6364
  2       0.477          0.1947         0.4413      +1.6413, +0.7613
  5       0.079          0.0322         0.1794      +1.3794, +1.0206
  6       0.000          0.0000         0.0000      +1.2000

Observed: [Log(m/m0)]^2 = 0.160, 0.040; Log m = +1.60, +0.80, +1.40, +1.00.




SUMMARY TABLE 

In the table following the Roman numerals in the first column refer 
either to the tables or to the figures themselves; in the second, third and 
fourth columns are given the means, the probable errors and the probable
errors of the means calculated by the new method. In the fifth,
sixth and seventh columns are given the means, the probable errors 
and the probable errors of the means taken from the literature listed 
at the end of this article. 


          By the New Method                  Taken from the Literature
          -------------------------------    -------------------------------
                    Probable   Probable                Probable   Probable
                    error      error of                error      error of
Tables    Mean                 the mean      Mean                 the mean      Source

I         10.0      0.68       0.068         10.       0.60       0.060         W. & S.
II        2.7       0.47       0.052         2.7       0.47       0.052         W.
III       0.65      0.24       0.026         0.7       0.24       0.026         W.
IV        137.6     35.26      1.12          137.6     37.0       1.2           B. & R.
V         14.5      0.97       0.08          14.5      1.1        0.087         Wood
VI        3.05      0.140      0.004         3.07      0.158      0.004         Collins
X         15.0      7.0        2.0                                              Science

a This value is erroneous. Apparently an arithmetical mistake.

SECTION II 









One of the fundamental postulates of the law of probability of errors 
is that positive and negative errors are equally frequent. This however 
is not generally true. It is true for example in military statistics where 
the deviations from the arithmetic mean are small. Thus in measuring 
the heights of soldiers the maximum deviation from the mean is never 
more than about one foot, while the height of the shortest soldier is 
about five feet. But if we wish to ascertain say the average number of 
children per family in the United States the frequency curve shows that 
some families may have negative children. For if the average be four 
children per family, and we know that some families have as many as 
five times that number, then, according to the above postulate, the
frequency curve must include the count zero and beyond. This is 
typical of a great many cases and is rather the rule than the exception. 
Let us now once more examine some of the figures of the previous 
section of this paper. It is not only conceivable but very likely that 
some of the samples of soil of which the nitrogen contents are mapped in 
figure III might have yielded no measurable amount of nitrogen. But the
frequency curve indicates that some of the samples might have contained 
a quantity less than zero. The same is true of the yield of oranges 
mapped on figure IV and of the bacterial counts mapped on figure X. 
It must not however be concluded from this that the law of probability 
of errors does not apply to these cases. It is the particular form of 
the mathematical expression for the law of probability of errors which 
does not apply. We have therefore sought an equation of such form 
that it should satisfy the postulates of the law of probability of errors 
and also agree with experience. This equation is 

$\frac{y}{y_0} = e^{-h^{2}\left(\mathrm{Log}_e\frac{m}{m_0}\right)^{2}}$   [9]

where m denotes the numerical value of any measurement and $m_0$ the
value of the geometric mean.* The meanings of y, $y_0$ and h are the
same as those of the same quantities in equation (2). Equation (9)
states that it is as likely, or rather as unlikely, that some values for m
be zero as $+\infty$; that is, in either case $y/y_0$ would equal zero. When
m equals $m_0$, $y/y_0$ equals 1; that is, the maximum probability is attained
when the measured values do not deviate from the value of the mean.
Transforming equation (9) into a rectilinear one, as has been done with 
equation (2), we obtain 

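$\mathrm{Log}\frac{y_0}{y} = 2.303\,h^{2}\left(\mathrm{Log}\frac{m}{m_0}\right)^{2}$   [10]

or

$\mathrm{Log}\frac{y_0}{y} = K\left(\mathrm{Log}\frac{m}{m_0}\right)^{2}$   [11]

Whence h, the index of precision, equals $\sqrt{K/2.303}$.   [12]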

We may now proceed with the construction of the experimental 
polygons with the values given in columns 1 and 4 of tables IIIa and
Xa, then find the "best" values for K from the straight line equation
(11) and finally construct the theoretical curves from the values given
in columns 1 and 7 of tables IIIa and Xa. The curves so obtained are
shown in figures IIIa and Xa. From these curves the probable errors
may be calculated as described in section I. 
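
The procedure of this section may be sketched numerically as follows. This is a minimal illustration under stated assumptions, not a reproduction of tables IIIa or Xa: all numerical values are hypothetical, the function names are illustrative, and a least-squares slope through the origin is substituted for the "best" straight line drawn by eye. The relation used is Log(y0/y) = K (Log(m/m0))^2, with h equal to the square root of K/2.303.

```python
import math

def fit_K_log(log_ms, ys, y0, log_m0):
    """Least-squares slope through the origin of
    Log(y0/y) against (Log m - Log m0)^2."""
    u = [(lm - log_m0) ** 2 for lm in log_ms]
    v = [math.log10(y0 / y) for y in ys]
    return sum(a * b for a, b in zip(u, v)) / sum(a * a for a in u)

def theoretical_log_m(y, y0, K, log_m0):
    """The two values of Log m at which the theoretical curve has height y."""
    d = math.sqrt(math.log10(y0 / y) / K)
    return log_m0 - d, log_m0 + d

# Hypothetical data generated from the law itself with K = 2.0
y0, log_m0, K_true = 6.0, 1.2, 2.0
log_ms = [0.8, 1.0, 1.4, 1.6]
ys = [y0 * 10 ** (-K_true * (lm - log_m0) ** 2) for lm in log_ms]

K = fit_K_log(log_ms, ys, y0, log_m0)
h = math.sqrt(K / 2.303)                 # index of precision
low, high = theoretical_log_m(3.0, y0, K, log_m0)
print(round(K, 4), round(h, 4), round(low, 4), round(high, 4))
```

Since the curve is symmetrical in Log m and not in m itself, the two values returned for each y lie at equal distances from Log m0, and no finite y corresponds to a negative measurement.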

*For a mathematical discussion see Galton and McAlister, Proc. Roy. Soc. Lond.
29:365 (1879).
SUMMARY

In section I of this paper, the usual mathematical expression for the 
law of the probability of errors has been transformed into a rectilinear 
form. With the aid of this equation, the statistical criteria for various
sets of data may be calculated very accurately without first finding
and squaring the individual residuals, which saves an enormous amount
of time and labor.

In section II, it is shown that the mathematical expression for the 
law of the probability of errors generally used holds only where the 
percentage deviations from the mean are small. A general equation is 
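then developed, of which the former is but a special case. For when the
percentage deviations from the mean are small, that is, when m is less
than $2m_0$, where m denotes the value of any measurement and $m_0$ the
value of the mean, our general equation

$\frac{y}{y_0} = e^{-h^{2}\left(\mathrm{Log}_e\frac{m}{m_0}\right)^{2}}$

may be expanded in series, thus:

$\mathrm{Log}_e\frac{m}{m_0} = \left(\frac{m}{m_0}-1\right) - \frac{1}{2}\left(\frac{m}{m_0}-1\right)^{2} + \frac{1}{3}\left(\frac{m}{m_0}-1\right)^{3} - \cdots$

Neglecting all terms but the first on the right-hand side, we obtain,

$\frac{y}{y_0} = e^{-h^{2}\left(\frac{m}{m_0}-1\right)^{2}}$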

which is identical with the ordinary law of the probability of errors 
generally used, and most often misused, for, as has been pointed out, 
this equation holds only where the percentage deviations from the mean 
are small. 
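
The closeness of the two expressions for small percentage deviations, and their divergence for large ones, can be seen in a short numerical comparison. The sketch assumes that the logarithm in the exponent of the general equation is natural, as the series expansion implies; the values of h and $m_0$ are hypothetical and the function names are illustrative.

```python
import math

def general_law(m, m0, h):
    """y/y0 with the logarithmic deviation ln(m/m0) in the exponent."""
    return math.exp(-(h * math.log(m / m0)) ** 2)

def ordinary_law(m, m0, h):
    """y/y0 with only the first series term, (m - m0)/m0, retained."""
    return math.exp(-(h * (m - m0) / m0) ** 2)

m0, h = 1.0, 2.0   # hypothetical mean and index of precision
for deviation in (0.01, 0.05, 0.20, 0.80):
    m = m0 * (1 + deviation)
    g = general_law(m, m0, h)
    o = ordinary_law(m, m0, h)
    print(f"{deviation:5.0%}   general {g:.5f}   ordinary {o:.5f}")
```

At a deviation of one per cent the two values are practically indistinguishable, while at eighty per cent they differ widely, which is the sense in which the ordinary law holds only where the percentage deviations are small.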

Transmitted April 29, 1920. 